GEOMETRIZATION FOR PATTERN RECOGNITION, 
DATA ANALYSIS, DATA MERGING, 
AND MULTIPLE CRITERIA DECISION MAKING 



RELATED APPLICATIONS 

This application claims the benefit of Provisional Patent Application Ser. Nr. 
60/399,122 filed 2002 July 30. This application claims the benefit of Provisional Patent 
Application Ser. Nr. 60/425,729 filed 2002 November 18. This application further relates to 
U.S. Patent Application Ser. Nr. 09/581,949 filed 2000 June 19 and to U.S. Patent 
Application Ser. Nr. 09/885,342 filed 2001 June 19. 



BACKGROUND OF THE INVENTION 



U.S. Patent Application Ser. Nr. 09/581,949 (hereafter USPA-1) discloses an 
energy minimization technique for pattern recognition and classification. In U.S. Patent 
Application Ser. Nr. 09/885,342 (hereafter USPA-2), this energy minimization technique is 
extended to a method for aggregation of ordinal scale data. 



PCT international application number PCT/US98/27374, filed 12/23/1998, and 
designating the United States, PCT international application number PCT/US99/08768, filed 
4/21/1991, and designating the United States, U.S. Provisional Patent Application Ser. Nr. 
60/399,122, filed 30/7/2002, and U.S. Provisional Patent Application Ser. Nr. 60/425,729, 
filed 18/11/2002, are incorporated herein by reference. The first incorporated application 
discloses an energy minimization technique for classification, pattern recognition, sensor 
fusion, data compression, network reconstruction, and signal processing. The incorporated 
application shows a data analyzer/classifier that comprises using a preprocessing step, an 
energy minimization step, and a postprocessing step to analyze and classify data. In a 
particular embodiment, the energy minimization is performed using IDMDS. The second 
application discloses a technique for merging ordinal data. In a particular embodiment, the 
merging process is performed using unconditional or matrix conditional, non-metric (ordinal) 
IDMDS. The third incorporated application discloses a modified energy minimization 
technique for improved and expanded classification, pattern recognition, sensor fusion, data 
compression, network reconstruction, and signal processing. The third application 
additionally discloses a meaningful scale conversion and aggregation process for intermixed 
scale type data. The fourth incorporated application discloses a 2-phase technique for scale 
conversion and aggregation of possibly intermixed scale type data. 



SUMMARY OF THE INVENTION 
Merging data includes receiving input data for merging, defining one or more 
transformations of the input data, defining a partition of the input data, applying admissible 
geometrization to the one or more transforms of the input data and the partition of the input 
data, producing at least an admissible transformation of the input data, and merging the input 
data using at least the admissible transformation of the input data. 

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a diagram illustrating components of an analyzer according to an 
embodiment of the invention. 

FIG. 2 is a diagram relating to the use of resampling or replication and aggregation 
with the analyzer according to the embodiment of FIG. 1. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
By way of illustration only, an analyzer, classifier, synthesizer, measuring, and 
prioritizing process for data that comprises using admissible geometrization with 
quantitative/qualitative/intermixed scale type data will be described and illustrated. The data 
to be analyzed, classified, measured, merged, or prioritized is processed using admissible 
geometrization to produce an element of admissible geometric fit. Using the element of 
admissible geometric fit and optionally other output of admissible geometrization, the data 
are analyzed, classified, synthesized, measured, or prioritized. The discussion of one or more 
embodiments herein is presented only by way of illustration. Nothing shall be taken as a 
limitation on the following claims, which define the scope of the invention. 

The present disclosure relates generally to recognition, classification, measurement, 
synthesis, and analysis of patterns in real world entities, events, and processes. It further 
relates to an iterative method for measurement or scale conversion and fusion of data from 
multiple sources and possibly intermixed scale types resulting in a quantitative merged value, 
index, or score. It also relates to an iterative method for multiple criteria decision making 
(MCDM) over mixtures of tangible, objective, quantitative data and intangible, subjective, 
qualitative data. 

The present disclosure further extends and improves the techniques disclosed in U.S. 
Patent Application Ser. Nr. 09/581,949 and U.S. Patent Application Ser. Nr. 09/885,342. 
These extensions and improvements include disclosure of a general, and therefore more 
useful, procedure for admissible geometrization of data allowing recognition, classification, 
conversion and synthesis of intermixed scale type data and a method for meaningful multiple 
criteria decision making. Additional extensions and improvements of the present disclosure 
can include, but are not limited to, the utilization of arbitrary energy decompositions, double 
data partitions, novel application of optimization constraints, and resampling or averaging 
methods for data analysis, smoothing and process invariance. 

The disclosures of USPA-1 and USPA-2 are based on minimization of the energy 
functional 

$$E(f_1,\ldots,f_m,X_1,\ldots,X_m)=\sum_{k=1}^{m}\sum_{i<j} w_{ijk}\left(f_k(c_{ijk})-d_{ij}(X_k)\right)^2,$$

over transformations $f_k$ and configurations $X_k \subset \mathbb{R}^N = (\mathbb{R}^N, d)$, $N$-dimensional real 
Euclidean space, subject to the constraints 

$$X_k = ZA_k,$$

where $Z$ is a reference configuration and the $A_k$ are diagonal matrices. The $w_{ijk}$ are proximity 
weights associated to the raw or initial data values $c_{ijk}$. 

In USPA-1 and USPA-2, the matrices $A_k$ in the constraint equation $X_k = ZA_k$ are 
diagonal. In an embodiment of the present invention, the matrices $A_k$ can be arbitrary 
nonsingular and reduced rank transformations of the reference configuration $Z$. This includes 
the case of diagonal $A_k$ and nonsingular matrices $A_k$ that can be decomposed as the product of 
a rotation matrix $Q_k$ and a diagonal matrix $T_k$: 

$$X_k = ZA_k = ZQ_kT_k.$$

Allowing rotations $Q_k$ in the constraint equation improves the rotational invariance of 
embodiments under the present invention as compared to USPA-1 and USPA-2. 
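
By way of illustration only, the following Python sketch builds a configuration from a reference configuration Z by a rotation followed by a diagonal dilation, that is, X = Z Q T. The helper names (`matmul`, `rotation`, `deform`), the restriction to two dimensions, and all numerical values are assumptions made for this example and are not taken from the disclosure.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rotation(theta):
    """Return the 2 x 2 rotation matrix Q for the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def deform(z, theta, dilations):
    """Return X = Z Q T, with Q a rotation and T = diag(dilations)."""
    t = [[dilations[0], 0.0], [0.0, dilations[1]]]
    return matmul(matmul(z, rotation(theta)), t)

# Hypothetical reference configuration Z: three points in the plane.
Z = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Diagonal deformation only (theta = 0), as in the diagonal constraint.
X_diag = deform(Z, 0.0, (2.0, 0.5))

# Rotation by 90 degrees followed by the same dilations.
X_rot = deform(Z, math.pi / 2, (2.0, 0.5))
```

With the angle set to zero the sketch reduces to the diagonal case, so the diagonal constraint appears here as a special case of the rotated one.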

As disclosed in USPA-1, minimization of $E$ with diagonal matrices $A_k$ corresponds to 
the INDSCAL model of individual differences multidimensional scaling (IDMDS). 
Minimizing E with the above more general constraints defines the general IDIOSCAL and 
PARAFAC models of multidimensional scaling (MDS) (see de Leeuw, J. and Heiser, W., 
"Theory of multidimensional scaling," in P. R. Krishnaiah and L. N. Kanal, Eds., Handbook 
of Statistics, Vol. 2. North-Holland, New York, 1982). A preferred embodiment of the 
present invention greatly expands the applicability of the INDSCAL, IDIOSCAL, and 
PARAFAC models of IDMDS. 

In addition to the constraints imposed by the above constraint equations, 
embodiments of the present invention make use of internally constraining the reference 
configuration $Z$. These internal constraints consist of holding none, a portion, or all of the 
points in $Z$ fixed during the minimization of the energy functional. 

While the energy $E$ is a mathematical descriptor and does not represent and is not 
intended to represent an actual physical energy, it is intuitively useful to observe that the total 
energy of an idealized physical network of nodes connected by $I$ massless springs is given by 
the formula 

$$E_{\mathrm{spring}} = \frac{1}{2}\sum_{i=1}^{I} k_i \left(L_i - L_{ei}\right)^2,$$

where $k_i$ is the spring constant, $L_i$ the spring length, and $L_{ei}$ the equilibrium spring length for 
spring $i$. The energy functional $E$ is analogous to the spring energy $E_{\mathrm{spring}}$ for $m$ coupled 
spring networks. With this interpretation, the initial and $f_k$-transformed data values $c_{ijk}$ in the 
energy functional $E$ correspond roughly to the spring lengths $L_i$ in $E_{\mathrm{spring}}$. In this way, data 
values can be thought of as spring or edge lengths in data networks or graphs. 
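
The spring analogy can be made concrete with a short Python sketch. It assumes the standard form of the spring energy, one half of the sum of each spring constant times the squared deviation of the spring length from its equilibrium length. The function name `spring_energy` and the three-spring network are hypothetical.

```python
def spring_energy(constants, lengths, rest_lengths):
    """Return 0.5 * sum_i k_i * (L_i - L_ei)**2 for a network of springs."""
    return 0.5 * sum(k * (length - rest) ** 2
                     for k, length, rest in zip(constants, lengths, rest_lengths))

# Hypothetical network of three springs.
constants = [1.0, 2.0, 1.0]      # spring constants k_i
lengths = [1.5, 1.0, 2.0]        # current lengths L_i
rest_lengths = [1.0, 1.0, 1.0]   # equilibrium lengths L_ei

energy = spring_energy(constants, lengths, rest_lengths)
```

The second spring sits at its equilibrium length and contributes nothing, so only the stretched springs carry energy, much as only misfitting data values contribute to E.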

The intuitive effect, then, of minimizing $E$ is to allow the simultaneous relaxation of 
multiple (frustrated) data graphs. Embodiments of the present invention greatly expand and 
improve upon the applicability and implementation of data graph relaxation. In particular, 
embodiments of the present invention can include a modified form of the energy functional $E$ 
that extends applicability to more general data sets and analyses. Embodiments of the 
present invention also generalize multiple graph relaxation to admissible geometrization with 
respect to non-quadratic, non-least squares objective functions. 

Although picturesque, the above analogy with idealized spring networks does not 
explain how arbitrary data sets are made geometric, tensile or rigidified. Embodiments of the 
present invention geometricize or rigidify data through (iterative) admissible geometrization. 
Admissible geometrization of data is broader than energy minimization and includes 
techniques and objective functions qualitatively different from $E$. In addition, admissible 
geometrization relates to a 2-phase process for explicit model construction for derived 
measurement or conversion of intermixed quantitative/qualitative data. In the following 
discussion, use of the single word "geometrization" shall include reference to the longer 
phrase "admissible geometrization." 

Geometrization begins by encoding data elements as the edge weights or "lengths" of 
certain complete graphs $\Gamma_k$ ($k$ running over some finite index set). These complete graphs are 
potentially "flabby" (or "rigid") depending on their mutual relationships and the strength of 
their scale types. Data sets are partitioned twice for admissible geometrization; the first 
partition is used to construct the graphs $\Gamma_k$, the second encodes the scale type of sets of $\Gamma_k$ 
edge lengths. Unlike USPA-1 and USPA-2, embodiments under the present invention 
provide a meaningful tool for analyzing doubly partitioned data and intermixed scale types in 
the graphs $\Gamma_k$ (admissibility, scale type, meaningfulness, and other measurement theoretic 
ideas are discussed in more detail below). This not only allows an embodiment under the 
present invention to be used for general scale conversion, data synthesis, and MCDM, but it 
also expands and improves on the disclosures in USPA-1 and USPA-2 for general pattern 
recognition, classification, and data analysis. 

To make precise the idea of admissible geometrization, some concepts from the 
representational theory of measurement (RTM) can be referenced. An informal discussion of 
RTM is sufficient for the present discussion. The following discussion follows Narens 
(Narens, L., Theories of Meaningfulness. Lawrence Erlbaum, Mahwah, New Jersey, 2002). 

Since Stevens, it is generally understood that data measurements can be differentiated 
into various qualitative and quantitative classes or scale types. (Stevens, S. S., "On the 
theory of scales of measurement," Science, 103, 1946, pp. 677-680.) 






Let $A$ be a set ($A$ is generally some empirical system of interest). Then a 
measurement or representation of $A$ is a function $f$ from $A$ into a subset $R \subseteq \mathbb{R}$ of the real 
numbers 

$$f : A \to R \subseteq \mathbb{R}.$$



The set of all representations for a given set $A$, denoted by $S = \mathrm{Hom}(A, R)$, is called a 
scale (the notation $\mathrm{Hom}(\,)$ derives from the formal representational theory of measurement 
where the measurements $f$ are homomorphisms of relational structures). The image of a scale 
$S$ is the set $\mathrm{Im}\,S = \{ f(x) \in R \mid x \in A \text{ and } f \in S \}$. Let $G$ be a transformation group on $\mathrm{Im}\,S$, 
that is, $G$ is a group of functions from $\mathrm{Im}\,S$ to itself with group operation the composition of 
functions. Then we say that $S$ has scale type $G$, or $G$ is the scale group of $S$, if there exists a 
fixed $f \in S$ such that 

$$S = G_f = \{ g \circ f \mid g \in G \},$$

that is, $S$ is the (induced) $G$-orbit of $f$. Here the scale $S$ is assumed to be regular, a technical 
condition that can be safely ignored in the following discussion. The elements of the scale 
group $G$ are called admissible transformations. In common use are nominal, ordinal, interval, 
ratio, and absolute scale types corresponding to permutation, isotonic, affine, similarity, and 
trivial admissible transformation groups, respectively. 

Note that the above groups of admissible transformations satisfy a chain of 
inclusions. These inclusions provide an order on scale types with the weakest scale type or 
measurement level (nominal) corresponding to the largest group of admissible 
transformations (permutations) and the strongest scale type (absolute) associated to the 
smallest (trivial) group of admissible transformations. 

We turn now to the RTM concept of meaningfulness. The basic idea behind 
meaningfulness is that the scale type of a set of measurements puts limitations on the 
conclusions that can be drawn from those measurements. A statement involving scales of 
measurement is said to be meaningful if its truth value is unchanged whenever every scale in 
the statement is modified by an admissible transformation. (See Roberts, F. S., "Limitations 
on conclusions using scales of measurement," in S. M. Pollock et al., Eds., Handbooks in OR 
& MS, Vol. 6, Elsevier, New York, 1994.) An example of a meaningless statement is the 
following: "Since it is 40° F today, it is twice as hot as yesterday when it was only 20° F." 
This statement is meaningless because if we modify the scales in the statement using the 
admissible affine transformation $C = (5/9)(F - 32)$ then the statement is false in terms of 
degrees Celsius. 
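
The temperature example can be checked numerically. The following Python sketch is illustrative only; the function and variable names are assumptions introduced here.

```python
def fahrenheit_to_celsius(f):
    """Admissible affine transformation C = (5/9)(F - 32)."""
    return (5.0 / 9.0) * (f - 32.0)

today_f, yesterday_f = 40.0, 20.0

# Ratio of the two temperatures on each scale.
ratio_f = today_f / yesterday_f
ratio_c = fahrenheit_to_celsius(today_f) / fahrenheit_to_celsius(yesterday_f)

# Truth value of "today is twice as hot as yesterday" on each scale.
twice_as_hot_f = ratio_f == 2.0
twice_as_hot_c = abs(ratio_c - 2.0) < 1e-9
```

On the Fahrenheit scale the statement holds, while on the Celsius scale the ratio is negative, because 20° F lies below the freezing point. The truth value changes under an admissible transformation, which is what makes the statement meaningless.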

An embodiment under the present invention can relate, in part, to meaningful 
aggregation. Consider the problem of aggregating ordinal preference ratings $P = (3, 3, 3)$ and 
$Q = (1, 1, 4)$. If we compute the usual arithmetic mean on these two sets of ratings, we find 
the mean of $P$ is greater than the mean of $Q$. Since we assumed the preference ratings were 
measured on ordinal scales, the above statement about the relative order of the means of $P$ 
and $Q$ should remain true when the ratings are modified by a monotone transformation. If 
we apply the (admissible) monotone transformation $1 \to 3$, $3 \to 4$, $4 \to 7$ and compute the 
mean on the transformed data, we discover that the mean of $P$ is now less than the mean of 
$Q$. Thus the truth value of the statement concerning means of ordinal data is not preserved 
and we conclude that the mean is not a meaningful merging function for ordinal scales. 
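
The arithmetic behind this example can be reproduced with the following Python sketch. It assumes that the second rating set reads Q = (1, 1, 4), which is the reading under which the stated comparison of means holds; the variable names are hypothetical.

```python
from statistics import mean

P = [3, 3, 3]
Q = [1, 1, 4]  # assumed reading of the second rating set

# The admissible (order preserving) transformation 1 -> 3, 3 -> 4, 4 -> 7.
g = {1: 3, 3: 4, 4: 7}
P_transformed = [g[x] for x in P]
Q_transformed = [g[x] for x in Q]

# Does "mean of P exceeds mean of Q" survive the transformation?
p_greater_before = mean(P) > mean(Q)
p_greater_after = mean(P_transformed) > mean(Q_transformed)
```

Before the transformation the means are 3 and 2; afterwards they are 4 and 13/3, so the order of the means reverses.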

It turns out that the only meaningful merging functions for ordinal data are order 
statistics (see Ovchinnikov, S., "Means of ordered sets," Math. Social Sci., 32, 1996, pp. 39- 
56). Order statistics are also the only meaningful merging functions for mixed qualitative 
and quantitative data since, for closed-form aggregation processes, the scale type of 
intermixed scales is determined by the scale type of the weakest scale (Osborne, D. K., 
"Further extensions of a theorem of dimensional analysis," J. Math. Psychol., 7, 1970, pp. 
236-242.) 
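
As a small illustration of why order statistics behave well on ordinal data, the Python sketch below checks that an order statistic commutes with a strictly increasing transformation. The ratings and the transformation `g` are hypothetical, and the lower median is used as one representative order statistic.

```python
from statistics import median_low

def g(x):
    """A strictly increasing, hence admissible ordinal, transformation."""
    return x ** 3 + 2 * x

ratings = [1, 4, 2, 5, 3, 2]

# Transform first and then take the order statistic ...
median_of_transformed = median_low([g(x) for x in ratings])

# ... or take the order statistic first and then transform.
transformed_median = g(median_low(ratings))
```

Both routes select the same data element, so statements about the relative order of such statistics keep their truth value under monotone rescaling.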

Real world data tends to be a mixture of different scale types. This is particularly true 
in the social and non-physical sciences, including economics, econometrics, finance, 
psychology, and so forth. Commonly used averaging or merging functions such as the 
arithmetic and geometric means are meaningless for intermixed data that includes nominal or 
ordinal scale types. Similarly, standard techniques for MCDM, for example, the analytical 
hierarchy process (AHP) (see Saaty, T. L., The Analytical Hierarchy Process: Planning, 
Priority Setting and Resource Allocation, RWS Publications, Pittsburgh, 1990.), are 
meaningless on mixed scale data. Embodiments of the present invention as disclosed herein 
provide an iterative approach to meaningful derived measurement or scale conversion, 
merging, and MCDM on data from qualitative, quantitative, and intermixed 
qualitative/quantitative scales. 
Embodiments of the present invention offer a further improved method and apparatus 
for classification, pattern recognition, sensor fusion, data compression, network 
reconstruction, signal processing, derived measurement or scale conversion, aggregation of 
intermixed scale type data, and multiple criteria decision making. 

The method and apparatus in accordance with embodiments of the present invention 
provide an analysis tool with many applications. This tool can be used for pattern 
classification, pattern recognition, signal processing, sensor fusion, data compression, 
network reconstruction, measurement, scale conversion or scaling, data synthesis or merging, 
indexing, or scoring, multiple criteria decision making, and many other purposes. 
Embodiments of the present invention relate to a general method for data analysis based on 
admissible geometrization. Embodiments of the present invention can use admissible 
geometrization (geometrization) to analyze data. A number of methods for geometrization of 
data have been identified. One embodiment of the invention utilizes a modified form of 
individual differences multidimensional scaling (2p-IDMDS) with generalized constraints. 
This embodiment also explicitly utilizes the 2-phase structure of 2p-IDMDS. 

Let $C = \{C_1,\ldots,C_m\}$ be a data set with data objects or cases $C_k = \{c_{k1},\ldots,c_{kn}\}$ and let 

$$C = \bigcup_{l} C_l \qquad (1)$$

be a (second) partition of $C$. (In the following, the letter $l$, written as a subscript, will indicate 
partition classes $C_l$ of partition (1). The subscript letter $k$ will indicate data objects $C_k$. We 
will from now on also refer to this second partition of $C$ as partition (1) or the (1)-partition.) 
The classes $C_l$ are determined by the user and need not be identical to the data objects $C_k$. It 
is assumed that each class $C_l$ of partition (1) has a definite scale type with scale group $G_l$ of 
admissible transformations. 

Embodiments of the present invention can admissibly geometrize the data $C$. This is 
accomplished by first associating to each data object $C_k$ a weighted complete graph $\Gamma_k$. The 
weights or edge "lengths" of $\Gamma_k$ are given by the $c_{ki} \in C_k$ and are determined up to admissible 
transformation by partition (1). More specifically, each edge length $c_{ki}$ belongs to some class 
$C_l$ and hence has scale type $G_l$. Intuitively, we think of the graphs $\Gamma_k$ as implicitly or 
potentially geometric objects with varying degrees of flabbiness (or rigidity) depending on 
the scale types of their edge lengths as determined by partition (1). By making this geometry 
explicit, embodiments of the present invention can discover structure and relationships in the 
data set $C$. In traditional IDMDS, the data elements $c_{ijk}$ are, in fact, proximities (similarities 
or dissimilarities). In this case, the potential geometry of the graphs $\Gamma_k$ is closer to the 
surface. Embodiments of the present invention do not require that $c_{ijk}$ be proximity data. In 
this sense, embodiments of the present invention disclose a new admissible length based 
encoding of information, which greatly extends the length based encoding disclosed in 
USPA-1 and USPA-2. 
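
One possible way to picture the length based encoding is sketched below in Python: the values of a data object are assigned, in order, to the edges of a complete graph. The function name `complete_graph`, the choice of assigning values to node pairs (i, j) with i < j in lexicographic order, and the sample values are all assumptions made for this illustration; this passage does not fix a particular assignment.

```python
from itertools import combinations

def complete_graph(values, n_nodes):
    """Map a flat list of data values to edge lengths of a complete graph.

    Edges are the node pairs (i, j) with i < j in lexicographic order,
    an assumed convention for this sketch.
    """
    edges = list(combinations(range(n_nodes), 2))
    if len(values) != len(edges):
        raise ValueError("need n(n-1)/2 values for a complete graph on n nodes")
    return dict(zip(edges, values))

# Hypothetical data object with six values, encoded on four nodes.
C_k = [0.7, 2.0, 1.3, 5.0, 0.1, 3.3]
graph_k = complete_graph(C_k, 4)
```

Nothing in this encoding requires the values to be proximities; any data values can be placed on the edges, and their scale types are recorded separately through partition (1).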

There are a number of ways to actualize or make explicitly geometric the potential 
geometry of the graphs $\Gamma_k$. One embodiment of the invention utilizes a significantly 
modified form of IDMDS, called 2p-IDMDS for 2-partition or 2-phase IDMDS, to 
admissibly geometrize the $\Gamma_k$. 2p-IDMDS is based on minimization of the following 
modified energy functional 

$$E_p(g_1,\ldots,g_m,X_1,\ldots,X_m)=\sum_{k=1}^{m}\sum_{i<j} w_{ijk}\left(\tilde{g}_k(c_{ijk})-d_{ij}(X_k)\right)^2, \qquad (2)$$

subject to the linear constraints 

$$X_k = ZA_k. \qquad (3)$$

$Z$ and $X_k$ are configurations of points in real Euclidean space $\mathbb{R}^N = (\mathbb{R}^N, d)$, with the usual 
metric $d = d_{ij}$, and the $A_k$ are $N \times N$ matrices with possible restrictions. The functions $\tilde{g}_k$ are 
certain (1)-partition specific mappings defined in terms of admissible transformations $g_l \in G_l$ 
from the scale group associated to the (1)-partition class $C_l$. (A definition of the $\tilde{g}_k$ is given 
below.) 
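
For orientation, the value of a weighted least squares energy of the form (2) can be computed as in the following Python sketch. It assumes the transformed data values are already available and stored per data object in a dictionary keyed by node pairs; the names `energy`, `disparities`, `weights`, and `configs` are hypothetical, and the sketch evaluates the functional only, without minimizing it.

```python
import math
from itertools import combinations

def energy(disparities, weights, configs):
    """Sum over k and i < j of w_ijk * (value_ijk - d_ij(X_k))**2.

    disparities[k][(i, j)] is the transformed data value for edge (i, j)
    of object k, weights[k][(i, j)] its weight, and configs[k] the list of
    point coordinates of configuration X_k.
    """
    total = 0.0
    for k, points in enumerate(configs):
        for i, j in combinations(range(len(points)), 2):
            residual = disparities[k][(i, j)] - math.dist(points[i], points[j])
            total += weights[k][(i, j)] * residual ** 2
    return total

# Hypothetical single configuration: a 3-4-5 right triangle.
configs = [[(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]]
disparities = [{(0, 1): 3.0, (0, 2): 4.0, (1, 2): 6.0}]
weights = [{(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}]

E_p = energy(disparities, weights, configs)
```

Only the edge (1, 2) misfits, by one unit, so the energy of this toy example is 1.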



DRAFT DATE: July 30, 2003 



Docket #1624-3641 




Minimization of $E_p$ with respect to the transformations $g_l$ ensures that the scale types 
of the (1)-partition classes $C_l$ are preserved. In this way, minimization of (2) defines an 
admissible or meaningful geometric representation 
of data graphs by configurations of points $X_k$ in $\mathbb{R}^N$. 



The constraint equations (3) imply that the embodiment of the invention is a merging 
process. Each complete graph $\Gamma_k$, or embedded configuration $X_k$, is iteratively merged, and 
thereby deformed, into the reference configuration $Z$. This embedding, merging, and 
deformation respects the scale types of the $\Gamma_k$ edge lengths through the admissible 
transformations $g_l$. Differences in deformation between the individual configurations $X_k$ 
(graphs $\Gamma_k$) and the reference configuration $Z$ are encoded in the matrices $A_k$. For diagonal 
$A_k$, the components of the vector $\mathrm{diag}(A_k)$ of diagonal elements of $A_k$ are dilations along the 
coordinate axes of $\mathbb{R}^N$. Under appropriate identification conditions, the set of dilation vectors 
$\mathrm{diag}(A) = \{\mathrm{diag}(A_k)\}$, and more generally, the set of deformation matrices $A = \{A_k\}$, can 
define classification spaces for the data objects $C_k$. 



In addition, norms $\|\mathrm{diag}(A_k)\|$ on the space $\mathrm{diag}(A)$ can be interpreted as giving the 
($Z$-relative) overall sizes of the configurations $X_k$ and hence of the graphs $\Gamma_k$. We can interpret 
the overall size of $X_k$ (via $A_k$) as the merged value of the data object $C_k$. Since vector norms 
are ratio scale numbers, the process has produced ratio scale merged values from the possibly 
intermixed qualitative/quantitative scales $C_l$. We will see that $\mathrm{diag}(A_k)$ is generally a 
complex, that is, a list of independent ratio scaled values, unless an identification condition is 
enforced on the matrices $A_k$. In this more general case, the vector norm or magnitude $\|\cdot\|$ is 
not a meaningful (merging) function and we aggregate the elements of $\mathrm{diag}(A_k)$ using other 
synthesizing functions including the determinant $\det(A_k)$ on the matrix $A_k$ and (weighted) 
geometric mean on the components of $\mathrm{diag}(A_k)$. Weights for weighted aggregation can be 
introduced externally or integrated into the geometrization procedure itself as discussed in 
more detail hereafter. 
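
The synthesizing functions named above can be written out for a diagonal deformation matrix as in the Python sketch below. The function names and the sample dilation vector are assumptions for illustration, positive diagonal entries are assumed so that the geometric mean is defined, and which function is appropriate depends on whether an identification condition is enforced, as stated in the text.

```python
import math

def euclidean_norm(diag):
    """Vector norm of the dilation vector diag(A_k)."""
    return math.sqrt(sum(a * a for a in diag))

def determinant(diag):
    """Determinant of a diagonal matrix: the product of its diagonal."""
    return math.prod(diag)

def weighted_geometric_mean(diag, weights):
    """Product of a_i ** w_i, assuming positive entries and weights summing to 1."""
    return math.exp(sum(w * math.log(a) for a, w in zip(diag, weights)))

# Hypothetical dilation vector for one data object.
diag_A_k = [2.0, 8.0]

norm_value = euclidean_norm(diag_A_k)
det_value = determinant(diag_A_k)
geo_value = weighted_geometric_mean(diag_A_k, [0.5, 0.5])
```

With equal weights the weighted geometric mean reduces to the ordinary geometric mean, here the square root of the determinant.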
Through admissible geometrization, embodiments of the present invention can also 
provide explicit (derived) measurement or scale conversion models from the scale types $G_l$ of 
$C_l$ to interval or ratio scales. Geometrization via minimization of $E_p$ contains an iterative 
alternating or 2-phase process whereby updated Euclidean distances $d_{ij}(X_k)$ are fitted to data 
values $c_{ijk}$, or transformed values $g_l(c_{ijk})$, and then updated transformed values are regressed 
on updated distances. Transformed values are also called pseudo-distances or disparities in 
the IDMDS literature. See Borg, I. and Groenen, P., Modern Multidimensional Scaling: 
Theory and Applications, Springer, New York, 1997. After some convergence criterion has 
been reached, the resulting transformed values can be converted to at least (independent) 
interval scales. Often ratio scales can be produced. If desired, the resulting output scales are 
made commensurate. Further mathematical or multivariate statistical manipulation of the 
transformed data is now possible including quantitatively meaningful aggregation using 
standard statistical merging functions and the application of exact statistics and distance 
function multiresponse permutation techniques. 

Embodiments of the present invention also make use of the above 2-phase process for 
MCDM and prioritization of alternatives measured with respect to qualitative/quantitative 
and intermixed scale types. Further details of these applications are given below. 


One embodiment of the invention implements admissible geometrization through 2p- 
IDMDS. It is based on a 2-partition or entry conditional extension of PROXSCAL, a 
constrained majorization algorithm for traditional IDMDS. (See Commandeur, J. and Heiser, 
W., "Mathematical derivations in the proximity scaling (Proxscal) of symmetric data 
matrices," Tech. Report No. 99-93-03, Department of Data Theory, Leiden University, 
Leiden, The Netherlands.) Embodiments of the present invention may be implemented 
using 2-partition or entry conditional extensions of other IDMDS algorithms. In the 
following, $\mathrm{tr}(A)$ and $A'$ denote, respectively, the trace and transpose of the matrix $A$. 
Let $C = \{C_1,\ldots,C_m\}$ be a data set with data objects or cases $C_k = \{c_{k1},\ldots,c_{kn}\}$ and let $G_l$ 
be scale groups for classes $C_l$ from partition (1). The 2p-IDMDS algorithm has eight steps 
with steps 4 and 6 implementing the 2-phase process described above. 

1. Choose constrained initial configurations $X_k^0$. 

2. Find transformations $g_l(c_{ijk})$ for fixed distances $d_{ij}(X_k^0)$. 

3. Compute the initial energy 

$$E_p(g_1,\ldots,g_m,X_1^0,\ldots,X_m^0)=\sum_{k=1}^{m}\sum_{i<j} w_{ijk}\left(\tilde{g}_k(c_{ijk})-d_{ij}(X_k^0)\right)^2.$$

4. Compute unconstrained updates $\bar{X}_k$ of $X_k^0$ using transformed proximities $\tilde{g}_k(c_{ijk})$ via 
majorization. 

5. Solve a metric projection problem by finding $X_k^+$ minimizing 

$$h(X_1,\ldots,X_m)=\sum_{k=1}^{m}\mathrm{tr}\,(X_k-\bar{X}_k)'\,V_k\,(X_k-\bar{X}_k)$$

subject to the constraints $X_k = ZA_k$. ($V_k$ are positive semidefinite matrices constructed 
from the weights $w_{ijk}$.) 

6. Replace $X_k^0$ by $X_k^+$ and find transformations $g_l(c_{ijk})$ for fixed distances $d_{ij}(X_k^0)$. 

7. Compute $E_p$. 

8. Go to step 4 if the difference between the current and previous values of $E_p$ is greater 
than $\varepsilon$, some previously defined number. Stop otherwise. 

In steps 3 and 4, the transformations $\tilde{g}_k$ are defined in terms of admissible 
transformations $g_l \in G_l$ as follows 

$$\tilde{g}_k(c_{ijk}) = g_l(c_{ijk}) \quad \text{for } c_{ijk} \in C_k \cap C_l.$$
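
The class-wise definition of the object transformations can be illustrated with a short Python sketch: each data value of an object is transformed by the admissible transformation belonging to its partition (1) class. The names `transform_object`, `values_k`, and `classes`, the class labels, and the fixed example transformations are hypothetical; in the algorithm itself the transformations are found by the optimal scaling of steps 2 and 6 rather than fixed in advance.

```python
def transform_object(values, classes, g):
    """Apply to each value the transformation of its partition (1) class.

    values[idx] is a data value of one data object, classes[idx] the label l
    of the partition class containing it, and g[l] a callable standing in
    for the admissible transformation of that class.
    """
    return {idx: g[classes[idx]](c) for idx, c in values.items()}

# Hypothetical admissible transformations for two classes.
g = {
    "ratio": lambda c: 2.0 * c,           # similarity transformation
    "interval": lambda c: 2.0 * c + 1.0,  # affine transformation
}

# Hypothetical data object whose values fall in two partition classes.
values_k = {(0, 1): 1.0, (0, 2): 3.0, (1, 2): 2.0}
classes = {(0, 1): "ratio", (0, 2): "interval", (1, 2): "ratio"}

transformed_k = transform_object(values_k, classes, g)
```

Values of one data object can thus receive different transformations, which is what distinguishes the 2-partition scheme from applying a single transformation per object.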
In (optional) step 2, and in step 6, the admissible transformations $g_l$ are elements of 
the partition (1) scale groups $G_l$ and the notation $d_{ij}(X_k^0)$ means those Euclidean distances 
corresponding to the admissibly transformed data elements $\tilde{g}_k(c_{ijk})$. Various normalizations or 
standardizations can be imposed on the transformed values $g_l(C_l)$ or on sets of transformed 
values. (Note, $g(B)$ denotes the image of the set $B$ under the mapping $g$.) For example, the 
union of the transformed values $g_l(C_l)$ can be normalized (or made commensurate) in each 
iteration, or the transformed values $g_l(C_l)$ can be separately normalized in each iteration and 
then the union normalized after convergence. The specific method of normalization may 
depend on the data and on the purpose of the analysis. In traditional IDMDS, normalization 
(standardization) is used to avoid the degenerate trivial solution $X_k = 0$ and $\tilde{g}_k(C_k) = 0$ where 
configurations and associated pseudo-distances are both mapped to zero. In the more general 
setting of 2p-IDMDS, normalization can have other purposes including commensuration 
across combinations of partition classes $C_l$. 

If partition (1) is trivial, that is, if there is only the one class $C_l = C$, then the above 
2p-IDMDS algorithm corresponds to standard unconditional IDMDS although extended to 
non-proximity data. If the partition classes $C_l$ of (1) are just the data objects $C_k$, and the scale 
groups $G_l$ are the same for all $l$ ($k$), then the 2p-IDMDS algorithm corresponds to standard 
matrix conditional IDMDS (again, extended to non-proximity data). Otherwise, 2p-IDMDS 
is a novel, generalized form of IDMDS. 

The PROXSCAL initialization step 1 is performed under the identity assumption 

$$X_1^0 = X_2^0 = \cdots = X_m^0.$$

For certain applications of embodiments of the present invention, this identity assumption 
may be inappropriate. In such cases, step 2 can be skipped or the initial configuration can be 
generated separately from the input data and made to satisfy the constraint equation $X_k = ZA_k$ 
through an initial metric projection. 
The solution of the metric projection problem in step 5 is subject to the constraint 
equations $X_k = ZA_k$. There is an indeterminacy in these equations: If $Q$ is an arbitrary 
nonsingular matrix, then 

$$X_k = ZA_k = ZQQ^{-1}A_k = \bar{Z}\bar{A}_k,$$

where $\bar{Z} = ZQ$ and $\bar{A}_k = Q^{-1}A_k$, so $\bar{Z}\bar{A}_k$ is another solution to the constraints. To ensure 
the uniqueness of the solutions to the constraint equation, an identification condition can be 
imposed on the matrices $A_k$. One such condition is expressed by the formula 

$$\sum_{k=1}^{m} A_k A_k' = m I_N, \qquad (4)$$

where $I_N$ is the $N$-dimensional identity matrix. (It is also possible to impose an identity 
condition on the reference configuration $Z$.) Imposition of an identification condition such as 
(4) has a number of benefits besides removing the ambiguity in the constraint specification. 
In particular, an identification condition allows the set of matrices $A = \{A_k\}$ to be treated as 
part of a matrix classification space and, for diagonal $A_k$, the set $\mathrm{diag}(A) = \{\mathrm{diag}(A_k)\}$ defines 
vectors in an $N$-dimensional classification vector space. The utility of enforcing an 
identification condition will be elaborated on further below. 
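
For diagonal deformation matrices, a condition of the form (4) can be enforced by a per-dimension rescaling, as in the Python sketch below. It assumes condition (4) reads as the sum over k of A_k times its transpose equal to m times the identity, which for diagonal matrices means that the squared entries in each dimension sum to m. The function name `identify` and the sample values are hypothetical.

```python
import math

def identify(diags):
    """Rescale diagonal matrices so that sum_k A_k A_k' = m * I.

    diags[k] is the diagonal of A_k. Returns the rescaled diagonals and the
    per-dimension factors by which the columns of Z would be multiplied so
    that every product Z A_k is left unchanged.
    """
    m = len(diags)
    n_dim = len(diags[0])
    scale = [math.sqrt(sum(d[j] ** 2 for d in diags) / m) for j in range(n_dim)]
    rescaled = [[d[j] / scale[j] for j in range(n_dim)] for d in diags]
    return rescaled, scale

# Hypothetical dilation vectors for m = 2 data objects in N = 2 dimensions.
diags = [[1.0, 2.0], [3.0, 2.0]]
identified, z_scale = identify(diags)

# Check: in each dimension the squared entries now sum to m.
column_sums = [sum(d[j] ** 2 for d in identified) for j in range(2)]
```

The rescaling corresponds to choosing a diagonal matrix Q in the indeterminacy described above, so the configurations themselves do not change; only the split between Z and the deformation matrices is fixed.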

The 2-phase part of the 2p-IDMDS algorithm is encoded in the initial (optional) step 
2 and then through iteration over steps 4 and 6 until the convergence criterion in step 8 is met. 
We note that in PROXSCAL, the ratio model is fixed in step 2 once and for all. For the 
purposes of scale conversion, embodiments of the present invention allow for the update of 
the ratio model with each iteration of the 2p-IDMDS algorithm. It may also be useful to 
define new admissible transformation algorithms for step 6. For instance, (weighted) 
monotone regression as implemented in PROXSCAL is based on means of blocks of order 
violators; certain applications of embodiments of the present invention may be enhanced by 
introducing monotone regression with medians on blocks of order violators. 
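
The difference between means and medians on blocks of order violators can be seen in a minimal, unweighted pool-adjacent-violators sketch in Python. This is a generic textbook style routine written for illustration, not the PROXSCAL implementation; the function name `monotone_regression`, the pluggable `block_stat` argument, and the sample values are assumptions.

```python
from statistics import mean, median

def monotone_regression(values, block_stat=mean):
    """Unweighted pool-adjacent-violators fit with a pluggable block statistic.

    values are assumed to be listed in the order in which the fitted values
    must be non-decreasing. Adjacent blocks whose statistics violate that
    order are pooled, and each pooled block is replaced by its statistic.
    """
    blocks = []
    for v in values:
        blocks.append([v])
        # Pool backwards while the last two blocks violate the order.
        while len(blocks) > 1 and block_stat(blocks[-2]) > block_stat(blocks[-1]):
            last = blocks.pop()
            blocks[-1].extend(last)
    fitted = []
    for block in blocks:
        fitted.extend([block_stat(block)] * len(block))
    return fitted

# Hypothetical distances listed in the rank order of the data.
distances = [1, 5, 2, 3, 9]

fit_mean = monotone_regression(distances, mean)
fit_median = monotone_regression(distances, median)
```

Both fits pool the block (5, 2, 3); the mean replaces it by 10/3, whereas the median replaces it by 3, a value that actually occurs in the block.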

Step 6 of traditional IDMDS algorithms is called optimal scaling. For ordinal optimal 
scaling, IDMDS algorithms generally distinguish between discrete and continuous data. If 
the data is continuous, then optimal scaling uses the so-called primary approach to ties where 
ties in the original data are allowed to be broken in the transformed data. In the secondary 
approach to ties, ties are not allowed to be broken and this is intended to reflect the discrete 
nature of the data. In the remainder of this disclosure, we will assume that the secondary 
approach to ties is used in step 6, that is, in the 2-phase portion of 2p-IDMDS. This makes it 
straightforward to construct derived measurement models from 2p-IDMDS transformed data. 
Derived measurement models may also be constructed using the primary approach to ties, but 
additional merging (of untied pseudo-distances) may be used to define a single-valued 
model. In general, the selection of primary or secondary ties depends on the data and 
purposes of the analysis. 



2p-IDMDS, through the PROXSCAL algorithm, also allows direct constraints on the 
reference configuration Z. This can include the ability to fix some or all of the points in Z. 
Borrowing from the spring network analogy, fixing coordinates in Z is analogous to pinning 
some or all of the spring/data network(s) to a rigid frame or substrate.



FIG. 1 illustrates an operational block diagram of a data
analysis/classifier/synthesis/measurement/prioritizing tool 100. Tool 100 is a three-step
process. Step 110 is a front end for data preprocessing and transformation. Step 120 is a
process step implementing admissible geometrization; in the presently illustrated
embodiment, this process step is implemented through the 2p-IDMDS algorithm described
above. Step 130 is a back end or postprocessing step which organizes, interprets, and
decodes the output of process step 120. These three steps are illustrated in FIG. 1.

It is to be understood that the steps forming the tool 100 may be implemented in a
computer usable medium or in a computer system as computer executable software code. In
such an embodiment, step 110 may be configured as first code, step 120 may be configured as
second code, and step 130 may be configured as third code, with each code comprising a
plurality of machine readable steps or operations for performing the specified operations.
While step 110, step 120, and step 130 have been shown as three separate elements, their
functionality can be combined and/or distributed. It is to be further understood that
"medium" is intended to broadly include any suitable medium, including analog or digital,
hardware or software, now in use or developed in the future.

Step 110 of the tool 100 consists of the transformation of the data into matrix form
and the encoding of partition (1). The matrix transformations for the illustrated embodiment
can produce nonnegative matrices. The type of transformation used depends on the data to
be processed and the goal of the analysis. (Note, step 110 input data may include modified
energy weights, see equation (2), which can also be written in matrix form. Examples of
such weight matrix encodings follow.) Similarly, the form of the encoding of partition (1)
can be determined by the data to be processed, its scale type(s), and the goal of the analysis.
While the data processed in step 110 may be proximity data, it is a goal of step 110 to
represent arbitrary forms of data as lengths or proximities. This can be accomplished by
simply writing the data into some part of one or more symmetric or lower triangular matrices
(symmetric matrices can be assembled from lower triangular matrices). For example,
sequential data, such as time series, signal processing data, or any data which can be written
as a list, can be transformed into symmetric matrices by direct substitution into the lower
(upper) triangle entries of a matrix of sufficient dimensionality. Matrices constructed in this
manner define complete weighted graphs (possibly with missing weights) where the weights
or edge lengths are the raw data values. In conjunction with the scale type information in
partition (1), these matrices are interpreted as having potential (admissible) geometry which
is actualized or explicitly geometricized by the illustrated embodiment of the invention
through 2p-IDMDS in step 120.

Permutation of direct matrix substitution order may result in different admissible
geometries. Invariance of tool 100 analyses under rearrangements of substitution order can
be restored by averaging tool 100 (step 120) over all inequivalent geometries. Approximate
invariance of tool 100 analyses is achieved by averaging tool 100 (step 120) over a sample or
subset of inequivalent geometries. This averaging over permutations of substitution orders or
geometries is illustrated in FIGS. 2 and 3. Averaging can be used as well in tool 100 for
smoothing metastable configurations X_k and matrices A_k associated with local minima of the
energy functional E_p and to ensure invariance over embedding dimension N. Note, averaging
here includes a merging technique that is meaningful and appropriate for the given data set.
This general technique of synthesizing over multiple versions of the same input is referred to
here as resampling or replication. (This terminology should not be confused with the
statistical method of resampling, though the ideas are similar.) These and related matters are
discussed in more detail below.

Step 120 of tool 100 reifies or makes explicit the potential geometry of the matrices
M_k from step 110. In the illustrated embodiment of the invention, step 120 admissibly
geometricizes data via 2p-IDMDS. 2p-IDMDS is based on minimization of the modified
energy functional E_p over geometric configurations X_k of step 110 matrices M_k and partition
(1) specified admissible transformations. E_p-minimal geometric representations or
configurations satisfy the general constraint equations X_k = Z A_k, where the A_k can be identity,
diagonal, reduced rank, or nonsingular matrices.

Step 130 of the tool 100 consists of visual and analytical methods for organizing,
presenting, decoding, interpreting, and other postprocessing of output from step 120. The
output of 2p-IDMDS includes, but is not limited to, decomposition of energy E_p, transformed
data g_l(C_l) for l running over partition classes, and deformation matrices A_k. (Note, g(B)
denotes the image of B under the mapping g.) 2p-IDMDS may produce high dimensional
output benefiting from analytical postprocessing techniques. Some examples of analytical
techniques are the following: clustering methods, statistical tools and permutation
procedures, vector space metrics such as norm, trace, and determinant functions, projection
pursuit, and Gaussian and other boundary growing techniques. There are many others. In
addition, differential coloring of dilation vectors diag(A_k) provides a visual and analytic tool
for interpretation and decoding of step 120 output, including detection of outliers and
anomalous signals and behaviors. Elements of geometric fit, which for the presently
illustrated embodiment of the invention include energy decompositions and functions of
energy decompositions, can be utilized for pattern matching and agreement, scoring and
ordering, and other data/pattern analyses. Graphs of total modified energy E_p against optimal
embedding dimensionality provide measures of network and dynamical system
dimensionality. Step 130 of tool 100 also provides methods for organization and
commensuration of optimally transformed data values. Organized and commensurate
transformed data can be used to define a fixed scale conversion model for non-iterative
derived scaling of new data, that is, without repeating steps 110 and 120 of tool 100.
Optimally transformed data values g_l(C_l) can also be used to determine MCDM priorities.
These and other applications of tool 100 will be described in detail below.

Let C = {C_k} be a data set with data objects or cases C_k = {c_ki}. Step 110
of tool 100 transforms each C_k ∈ C to matrix form M(C_k) = M_k, where M_k is a p-dimensional
nonnegative hollow symmetric matrix. (Hollow means diag(M_k) = 0, the p-dimensional zero
vector.) The cases C_k can be written to arbitrary p x q matrices M_k (in an alternative
embodiment discussed later, the matrices M_k are rectangular); however, for clarity of
exposition, the above restrictions are adopted.

More formally, step 110 may be expressed as a map or transformation

$$M : C \to H_p(\mathbb{R}_{\ge 0}), \qquad C_k \mapsto M_k,$$

where H_p(R_{≥0}) denotes the set of p-dimensional, nonnegative, hollow, symmetric matrices.
The precise rule(s) for calculating M, including determination of matrix dimensionality p,
depends on the data C and the purpose of the analysis.

Since the M_k are nonnegative hollow symmetric matrices, they can be interpreted and
processed in tool 100 as proximity matrices. In this way, the transformation

$$C_k \mapsto M_k$$

can be thought of as defining a mapping
from cases C_k to weighted complete graphs with p vertices or nodes.

If C contains proximity data, or if proximity data is constructed from C prior to or as
part of the transformation M, then the matrices M_k are bona fide proximity matrices. For
example, if C consists of binary images C_k, then M_k may be defined as the distance matrix
with ij-th entry the two dimensional city-block distance between "on" pixels i and j.
However, C need not satisfy either of these conditions to be processed by tool 100.
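
As an illustration of the binary image example, the following sketch builds the city-block
distance matrix between "on" pixels. It assumes the image is given as a list of rows of 0/1
values and that "on" pixels are enumerated in row-major order; the function name
city_block_matrix is hypothetical.

```python
def city_block_matrix(image):
    """Return (pixels, M) for a binary image given as a list of 0/1 rows.

    pixels lists the (row, column) coordinates of the "on" pixels in row-major
    order; M[i][j] is the city-block distance between pixels i and j.
    """
    pixels = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    M = [[abs(r1 - r2) + abs(c1 - c2) for (r2, c2) in pixels] for (r1, c1) in pixels]
    return pixels, M
```

The resulting matrix is nonnegative, hollow, and symmetric, so it can be passed on as a
proximity matrix without further encoding.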

The map M can be combined with other transformations F to form composite matrix
encodings (M ∘ F)(C_k). For instance, F could represent the fast Fourier transform on signal
C_k, and M_k = [m_ij]_k is defined by m_ijk = |a_ki - a_kj| with a_ki = F(c_ki) the output magnitudes for
signal C_k at frequencies i and j. The case where F represents a (geometry altering)
permutation of the elements of C_k is important for scale conversion and synthesis based on
direct substitution matrices M_k and is discussed further below. If the data C are organized in
tabular form, that is, as a rectangular matrix with rows C_k, then a useful transformation is
F(C) = C^T, the transpose of C. In the context of data mining, this transformation amounts to
applying tool 100 to data variables or fields instead of data cases or individuals.
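
The Fourier-based composite encoding can be sketched as below. For self-containment the
sketch uses a naive discrete Fourier transform from the standard library rather than a fast
Fourier transform; the function names are hypothetical, and the entry rule is the absolute
difference of output magnitudes described above.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of `signal` (naive O(n^2) form)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(signal)))
        for f in range(n)
    ]

def spectral_difference_matrix(signal):
    """Matrix with ij-th entry |a_i - a_j|, where a_i is the magnitude at frequency i."""
    a = dft_magnitudes(signal)
    return [[abs(ai - aj) for aj in a] for ai in a]
```

In practice the naive transform would be replaced by an FFT routine; only the magnitudes
enter the matrix, so the choice of transform implementation does not change the encoding.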

If C is not comprised of proximity data, we can still treat it as proximity data through
direct substitution of data elements c_ki ∈ C_k into entries of M_k. The map M as direct or entry
substitution is one approach to preprocessing intermixed measurement level data for tool 100
based scale conversion, data merging, and MCDM, as well as general pattern recognition,
classification, and data analysis.

For direct substitution of data into matrices M_k it is sufficient to consider only the
lower triangular portion of M_k (the upper triangle is determined by symmetry). Let T_k = [t_ij]_k
be a lower triangular matrix (or the lower triangle of M_k) and define v = max(#C_k), the
maximum cardinality, #C_k, over data sets C_k ∈ C. Then for direct substitution, the matrices
T_k have order

$$V = \left\lceil \frac{1 + \sqrt{1 + 8v}}{2} \right\rceil,$$

where ⌈x⌉ denotes the ceiling function. V is the
smallest positive integer satisfying the inequality V(V - 1)/2 ≥ v.

The entries in T_k are filled in, from upper left to lower right, column by column, by
reading in the data values of C_k, which are assumed to be ordered in some consistent manner.
For example, for data object C_k and triangular matrix T_k: t_21k = c_k1 (the first data value in C_k is
written in the second row, first column of T_k), t_31k = c_k2 (the second data value of C_k is written
in the third row, first column of T_k), t_32k = c_k3, t_41k = c_k4, and so forth. Note, we assume T_k is
hollow, so we set t_iik = 0 for all i ≤ V.
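
A minimal sketch of direct substitution follows. It assumes the fill order of the worked
example (t_21, t_31, t_32, t_41, and so on), uses None to stand for a missing entry, and
mirrors the lower triangle into the upper triangle; the function names and the fill
parameter are hypothetical.

```python
def triangular_order(v):
    """Smallest V with V * (V - 1) / 2 >= v, i.e. ceil((1 + sqrt(1 + 8v)) / 2)."""
    V = 1
    while V * (V - 1) // 2 < v:
        V += 1
    return V

def direct_substitution(data, v=None, fill=None):
    """Write `data` into the lower triangle of a hollow symmetric V x V matrix.

    `v` is the maximum cardinality over all data objects (defaults to len(data));
    entries left over after the data runs out are set to `fill`.
    """
    v = len(data) if v is None else v
    V = triangular_order(v)
    M = [[fill] * V for _ in range(V)]
    for i in range(V):
        M[i][i] = 0  # hollow: zero diagonal
    values = iter(data)
    for i in range(1, V):
        for j in range(i):
            x = next(values, fill)  # falls back to `fill` once data is exhausted
            M[i][j] = x
            M[j][i] = x
    return M
```

With four data values the order is V = 4, so two of the six lower triangle entries are
left as missing (or augmenting) values.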

If the number of data values n in some data set C_k is less than v, or if strict inequality,
V(V - 1)/2 > v, holds, then the remaining unfilled entries in T_k can either be left missing or
they can be filled in with dummy or augmenting values. (If the entries are left missing, we
will refer to this as augmenting with missing values.) Various ways of augmenting matrices
M_k are described in more detail below. Embodiments of the present invention allow
partitioning and isolation of these augmenting values from actual data values during step 120
processing. Note, too, that missing values allow tool 100 to be applied to data sets C with
data objects C_k having different numbers of elements; this is the case for both non-proximity
and proximity data.

As mentioned earlier, if direct substitution matrix encoding is utilized in step 110 of
tool 100, then any consistently applied permutation of the ordered elements in the C_k will
result in a new input matrix T_k with possibly different admissible geometry. (We note that
the number of geometry altering permutations is less than the total number of possible
permutations on the entries of C_k, but this number still grows very rapidly with v.) FIG. 2
shows the use of tool 100 for resampled or replicated input. Tool 100 may be applied
according to FIG. 2 to replications over permutations on direct substitution order, to
replications over some or all 2p-IDMDS embedding dimensions, to replications from
multiple 2p-IDMDS random starts, to some combination of the above, or to replications or
samplings with respect to other 2p-IDMDS inputs or parameters of interest.

In the case of direct substitution matrix encodings, a permutation invariant output
from tool 100 can be defined by averaging step 120 output, including E_p decompositions,
configurations X_k and Z, and matrices A_k, over all geometry altering rearrangements on the
C_k. A completely permutation invariant output is computationally intractable for even
moderately sized data sets. Still, approximately invariant output can be found by averaging
over a sample of all possible permutations. The appropriate sample size may be determined
statistically through stability or reliability analysis of replication output. The averaging
process or function used to synthesize sample (resampled) or replicated output of step 120 of
tool 100 depends on the input data and purpose of the analysis.

For specificity, we give some examples of this averaging process; other tool 100
replication and averaging procedures may be easily created by those skilled in the art. We
assume that tool 100 has been implemented using r samples or replications. Suppose first
that these replications are over step 110 direct substitution orders; then the r replicated
deformation matrices A_ki, where the subscript i denotes the i-th sample or replication number,
can be merged by computing separate geometric means on the r replication values for each
entry of the matrices A_ki. In a second example, we suppose that the A_ki are diagonal matrices
and the goal of the tool 100 analysis is to synthesize the information in data objects C_k. This
can be accomplished by computing norms, ||diag(A_ki)||, for each data object k and replication
i, and defining the geometric mean of these r norms on the k-th object to be the merged value
of the information in C_k. If we again suppose we wish to merge the data in objects C_k, we
can also compute the centroid of each A_ki and then calculate the geometric mean of the r
centroids for each k. We note that these last two examples include some sort of identification
condition on the deformation matrices A_ki. In general, the goal of the analysis and the data
analyzed will determine the manner in which replication and aggregation are carried out. In
particular, depending on the circumstances, it may be possible to perform a calculation of
interest on the i-th replication space first and then combine results over r replications; for
other analyses, the classification configurations may be combined first and then the desired
calculation performed.
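
The second example above (norms followed by a geometric mean over replications) can be
sketched as follows. The sketch assumes the diagonals of the r replicated matrices for a
single data object k are available as lists with nonzero norm; the function names are
hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def merged_value(replicated_diagonals):
    """Merge r replicated dilation vectors diag(A_ki) for one data object k.

    Each element of `replicated_diagonals` is the diagonal of A_ki for one
    replication i; the result is the geometric mean of their Euclidean norms.
    """
    norms = [math.sqrt(sum(a * a for a in diag)) for diag in replicated_diagonals]
    return geometric_mean(norms)
```

The alternative order of operations mentioned above (combine first, then compute) would
apply geometric_mean entrywise to the replicated diagonals before taking a single norm.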

An alternative matrix form M_k which is invariant with respect to consistent
reorderings of the data objects C_k is called ideal node encoding. It consists of writing the list
C_k to the first column of a (v + 1) x (v + 1) hollow matrix after skipping the first row. It is
called ideal node encoding because the resulting matrix can be interpreted as representing the
proximity of n unspecified nodes or embedding points to an ideal node (in terms of complete
graphs) or ideal point (in terms of configurations). The entries away from the first column
and diagonal of the ideal node matrix can be left missing or filled in, as with direct
substitution matrix encoding, using augmenting values. This ideal node matrix form is
applicable to scale conversion, data merging, MCDM, and general data/pattern analysis.
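
Ideal node encoding can be sketched in a few lines. The sketch assumes None marks a
missing entry and mirrors the first column into the first row so that the matrix is
symmetric; the function name ideal_node_matrix and the fill parameter are hypothetical.

```python
def ideal_node_matrix(data, v=None, fill=None):
    """Write `data` to the first column of a hollow (v + 1) x (v + 1) matrix.

    The first row is skipped (index 0 is the ideal node); entries away from the
    first column, first row, and diagonal are set to `fill`.
    """
    v = len(data) if v is None else v
    size = v + 1
    M = [[fill] * size for _ in range(size)]
    for i in range(size):
        M[i][i] = 0  # hollow: zero diagonal
    for i, x in enumerate(data, start=1):
        M[i][0] = x
        M[0][i] = x  # symmetric counterpart
    return M
```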

Step 110 of the presently preferred embodiment of tool 100 also includes
specification of partition (1) of C,

$$C = \bigcup_{l=1} C_l \qquad (1)$$

along with the scale groups or scale types G_l for partition classes C_l. The choice of partition
(1) and scale groups G_l is determined by the data C and specific analysis issues. The actual
algorithmic encoding of partition (1) can be accomplished through indicator matrices or some
other bookkeeping device and can be implemented readily by one skilled in the art. Inclusion
of double or 2-partitioning in an embodiment of the invention allows tool 100 to be
meaningfully extended to heterogeneous, messy, intermixed scale type databases common in
real world applications. It also increases the flexibility of tool 100 in processing unusual or
structured matrix forms M_k.

As an example of the latter, we describe briefly a step 110 hybrid matrix form that is
assembled using direct substitution and derived proximities. Suppose that the data set C
consists of both ordinal ratings data C_k and certain proximity data P_k defined as follows. Let
rank(c_ki) denote the rank order of element c_ki in the ratings data C_k. Define proximities p_ijk =
|rank(c_ki) - rank(c_kj)| for 1 ≤ i ≤ j ≤ n. Then the first column of the hybrid matrix M_k
consists of the ratings C_k as in the ideal node form, that is, beneath the zero diagonal the data
list C_k is substituted directly into the first column of M_k. The remaining entries of the hybrid
matrix (including the diagonal) are filled in or augmented using the absolute rank differences
p_ijk.

To process this data meaningfully, we partition C into ratings C_k and proximities P_k, with
isotonic scale group for the ratings C_k and similarity scale group for proximities P_k. (Other
partitions might also be meaningful. For instance, the ratings C_k (proximities P_k) could be
collected into a single ordinal scale (ratio scale) class and/or the proximities could be
assigned separately, or collectively, to a weaker scale type.)
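
One way to assemble the hybrid matrix is sketched below. The sketch makes two labeled
assumptions that the text does not fix: tied ratings share a rank (dense ranking), and the
rank difference for ratings i and j is stored at row i, column j of the block that excludes
the ideal node row and column. The function name hybrid_matrix is hypothetical.

```python
def hybrid_matrix(ratings):
    """Hybrid ideal node matrix: ratings in the first column, rank differences elsewhere."""
    # Dense ranks: equal ratings receive the same rank (an assumption of this sketch).
    rank_of = {x: r for r, x in enumerate(sorted(set(ratings)), start=1)}
    ranks = [rank_of[x] for x in ratings]
    n = len(ratings)
    M = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Ratings go beneath the zero diagonal in the first column (and its mirror).
        M[i][0] = M[0][i] = ratings[i - 1]
        for j in range(1, n + 1):
            # Remaining entries, diagonal included, are absolute rank differences.
            M[i][j] = abs(ranks[i - 1] - ranks[j - 1])
    return M
```

The first column would then be assigned to the ratings class and the remaining block to
the proximities class when partition (1) is encoded.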

Step 120 in tool 100 is the application of 2p-IDMDS as a 2-partition, 2-phase process
for admissible geometrization. The matrices M_k and partition related information are input to
the modified PROXSCAL algorithm with additional user supplied settings and specifications
including embedding dimension N, model or form of the constraint matrices A_k, initialization
method and configuration, direct restrictions, if any, on the reference configuration Z,
convergence criterion ε > 0, and iteration limit. For certain applications, nontrivial weight
matrices W_k = [w_ij]_k are also specified. (We will say more about these settings and
specifications in the examples below.)

The embedding dimension N for admissible geometrization step 120 depends on the
input data C and the goal of the analysis. For scale conversion (merging) of intermixed scale
type data, N is often set to the maximum possible value. For direct substitution matrices M_k,
we set N = V - 1. For ideal node matrix forms, N = v + 1. Choosing large N may reduce the
occurrence of artificially induced lossy compression of data. Large N also mitigates
convergence to non-global, local minima. Settings of embedding dimension N less than the
maximum (the maximum being one less than the order of the matrices M_k) result in
dimensional reduction of the data. Dimensional reduction is desirable under certain
circumstances, for instance, if the data C is known to be (or suspected of being) highly
correlated or redundant. However, correlated or redundant information in C will also be
automatically expressed in hyperplane or hypersurface restricted configurations Z ⊂ R^N and
in tool 100 output classification spaces A = {A_k}. (A purpose of postprocessing step 130 is to
uncover such hypersurface arrangements.) Under certain conditions, an alternative to a fixed
embedding dimension N is to sum or average step 120 output over all embedding dimensions
N less than the maximum order of the input matrices M_k. This approach to embedding
dimension via resampling can be used, in particular, when the outputs of interest are optimal
transformations g_l, optimally transformed data values g_l(C_l), and distances d_ij(X_k). In this
case, summation or averaging over outputs establishes the invariance of tool 100 with respect
to dimensionality (modulo variations due to local minima and the failure of permutation
invariance in case direct substitution transformations were used in step 110). Note that
traditional IDMDS analyses seek low dimensional representations of proximity data. The
preferred embodiment of tool 100 has no such requirement.

Step 130, the back end or postprocessing step of tool 100, organizes, decodes,
interprets, refines, and generally further manipulates the 2p-IDMDS output of step 120.
2p-IDMDS output includes (but is not limited to) a reference configuration Z ⊂ R^N,
deformation matrices A = {A_k}, various decompositions of the modified energy functional E_p,
partition dependent optimal transformations g_l, optimally transformed data values
g_l(C_l), and distances d_ij(X_k). When sampling or replication is used in step 110 and/or step 120
of tool 100, there may be multiple outputs to step 130, that is, multiple reference
configurations Z, multiple sets of deformation matrices A, decompositions of E_p, multiple
partition dependent optimal transformations g_l, and so forth.

The set of deformation matrices A = {A_k} can be interpreted as a classification space
that reveals structure and relationship between data objects C_k ∈ C. If the deformation
matrices A_k are diagonal, then the set of dimensional dilation values diag(A) = {diag(A_k)}
forms a set of complexes (where, again, a complex is a list of independent ratio scale values).
Under the identification condition

$$\sum_{k=1}^{m} A_k A_k^{T} = m I_N \qquad (4)$$

the set diag(A) is contained in an N-dimensional vector space and this space may be
investigated using standard mathematical and statistical tools. The usefulness and generality
of the sets A and diag(A) is greatly expanded under embodiments of the present invention as
compared to traditional treatments in IDMDS and non-traditional applications in USPA-1
and USPA-2.



If preprocessing step 110 consists of direct substitution or ideal node matrices with
partition (1), then deformation complexes diag(A) can be used to define a meaningful
(iterative) merging process

$$C_k \mapsto \varphi(\operatorname{diag}(A_k)) \in \mathbb{R}_{\ge 0}$$

that assigns a nonnegative ratio scale real number to each data object C_k. The function φ
depends on the nature of the set diag(A) and whether or not an identification condition has
been imposed on the dilation matrices A_k. If an identification condition such as (4) is used in
step 120, then one possibility is φ(diag(A_k)) = ||diag(A_k)||, the usual L2-norm on R^N (or the
nonnegative orthant in R^N). Other norms or functions could be used as well. If no
identification condition is specified, then the complexes diag(A_k) can be merged using the
(weighted) geometric mean

$$\varphi(\operatorname{diag}(A_k)) = \varphi(a_{k1}, \ldots, a_{kN}) = \left( \prod_{i=1}^{N} a_{ki}^{\,w_{ki}} \right)^{1/W_k},$$

where w_ki are predetermined weights and W_k their sum. An alternative to the geometric mean
is the determinant

$$\varphi(A_k) = \det(A_k).$$
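
The three merging functions can be sketched for diagonal deformation matrices as follows.
The sketch assumes the dilation values of one matrix A_k are given as a list of positive
numbers; the function names are hypothetical, and the determinant is computed only for the
diagonal case, where it reduces to the product of the diagonal entries.

```python
import math

def l2_norm(diagonal):
    """Euclidean norm of the dilation vector diag(A_k)."""
    return math.sqrt(sum(a * a for a in diagonal))

def weighted_geometric_mean(diagonal, weights):
    """(prod_i a_i ** w_i) ** (1 / W), with W the sum of the weights; a_i > 0."""
    total = sum(weights)
    return math.exp(sum(w * math.log(a) for a, w in zip(diagonal, weights)) / total)

def diagonal_determinant(diagonal):
    """Determinant of a diagonal matrix: the product of its diagonal entries."""
    return math.prod(diagonal)
```

Whichever function is chosen, the returned number plays the role of the merged value of
the data object associated with A_k.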

The determinant can be used to determine the size or volume of general deformation matrices
A_k. The basic idea in each of the above examples is that the overall size of the deformation
matrices A_k can be interpreted as the merged value of the data object C_k. In this context, the
identification condition (4) produces commensurated A_k matrix entries. Because the
measurement levels of the initial data have been preserved via (1)-partition admissible
transformations, the above discussion discloses a meaningful scale conversion and merging
process. The merged values are ratio scale magnitudes. In the case of direct substitution
preprocessing, to ensure that the above merging process is symmetric or permutation
invariant it is necessary to average over all geometry altering rearrangements of the input
data C_k. Since this is computationally intractable for even moderately sized data sets, a
smaller sample of rearrangements or replications is averaged over, resulting in an
approximately symmetric merging process.

A set of metric or baseline merged values for data set C can be determined by
applying step 120 of tool 100 to a trivial partition of C with ratio measurement level.
Comparison of the original merged values of C with the baseline merged values is an
indicator of the degree to which the data set C is amenable to standard statistical aggregation
techniques. Original tool 100 merged values can also be compared directly to merged values
from standard statistical aggregation functions such as the arithmetic or geometric mean. In
addition, statistical measures of variation, scatter, or dispersion of tool 100 merged values
may be used to determine the degree of coherence or relatedness of the underlying data set C.

For data/pattern matching, agreement, scoring, ordering, and other data/pattern
analyses, (functions of) decompositions of the modified energy E_p can be used. For example,
if we let E_pk denote the decomposition of E_p with respect to data object k,

$$E_{pk} = \sum_{i<j} w_{ijk} \left( \tilde{g}(c_{ijk}) - d_{ij}(X_k) \right)^2,$$

then the ratio

$$\frac{\left| E_{pk} - E_{pl} \right|}{E_p}$$

is a measure of agreement or matching between data objects k and l, where E_p denotes the
total energy. Another measure of agreement is given by the simple ratio

$$\frac{E_{pk}}{E_{pl}}.$$

Step 130 of tool 100 can be configured to process decompositions of E_p in many
ways.
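
A sketch of the energy-based agreement measure follows. It assumes that the share of the
modified energy attributable to one data object is the weighted sum of squared residuals
between transformed values and configuration distances, which is our reading of the
decomposition rather than a formula confirmed by the source; the function names and
argument layout are hypothetical.

```python
def energy_share(weights, transformed, distances):
    """Weighted sum of squared residuals for one data object.

    The three arguments are parallel lists over the entries (i, j) of one matrix:
    energy weights, transformed data values, and configuration distances.
    """
    return sum(w * (g - d) ** 2 for w, g, d in zip(weights, transformed, distances))

def agreement(e_k, e_l, e_total):
    """Absolute difference of two energy shares relative to the total energy."""
    return abs(e_k - e_l) / e_total
```

Smaller values of agreement indicate that the two data objects fit the common geometry
about equally well under this reading.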

Let data objects C_k be written to direct substitution or ideal node matrices M_k with
partition (1) classes C_l and scale groups G_l. Step 130 postprocessing can be applied to step
120 2-phase transformed data values g_l(C_l) to construct a fixed data conversion or derived
measurement model. The 2-phase 2p-IDMDS transformed data values are substituted for the
original raw values c_li in partition classes C_l. The resulting substitution rule

$$C_l \to g_l(C_l), \qquad c_{li} \mapsto g_l(c_{li})$$

defines a derived measurement or scale conversion model. Nominal, ordinal, and ratio scale
types are transformed into ratio scales. Interval (affine) scales are mapped to interval scales.
In this way, the partition classes C_l are converted to independent scales at interval
measurement levels or stronger. After commensuration or normalization, statistical tools
meaningful for interval scales can be applied to the converted data. In particular, the derived
measurements can be meaningfully aggregated using the (weighted) arithmetic mean.
Commensuration or normalization can also be applied on each iteration of the 2p-IDMDS
algorithm in step 120 of tool 100. The choice of how and when to normalize transformed
data depends on the data itself and the purpose of the tool 100 analysis.
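
The substitution rule can be read as a lookup table, which is what makes non-iterative
conversion of new data possible. The sketch below assumes the secondary approach to ties,
so that equal raw values always carry the same transformed value; the function names are
hypothetical and the transformed values in the usage are made-up placeholders, not output
of 2p-IDMDS.

```python
def build_conversion_model(raw_values, transformed_values):
    """Map each raw value in a partition class to its transformed value."""
    model = {}
    for raw, g in zip(raw_values, transformed_values):
        # Under the secondary approach to ties the mapping must be single-valued.
        if model.setdefault(raw, g) != g:
            raise ValueError(f"raw value {raw!r} has more than one transformed value")
    return model

def convert(model, new_values):
    """Apply a fixed conversion model to new data without re-running the algorithm."""
    return [model[x] for x in new_values]

def weighted_mean(values, weights):
    """Weighted arithmetic mean of converted (interval scale or stronger) values."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)
```

For instance, a model built from the raw values "low", "mid", "mid", "high" and placeholder
transformed values 0.2, 0.5, 0.5, 1.0 converts the new list "high", "low" to 1.0, 0.2, which
can then be aggregated with weighted_mean.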

If direct substitution matrix forms are used in step 110, then the above aggregation
procedure can be made approximately symmetric (invariant) by averaging over a sample of
geometry altering permutations of matrix entry substitution order. This replication or
averaging over multiple applications of tool 100 is depicted in FIG. 2. To ensure that
averaging over samples is meaningful, the transformed values are first normalized or made
commensurate across replications (this is possible since each partition class in each
replication has been converted during step 120 to interval scale or stronger). On the other
hand, if ideal node matrix forms are used in step 110, then the above tool 100 scale
conversion and merging procedure is symmetric (invariant) by construction (this follows
since proximity matrices are invariant under simultaneous rearrangement of row and column
orders).



Data that has been converted using tool 100 as disclosed above can be meaningfully
analyzed or processed further. That is, the converted data sets, g_l(C_l), are interval scale
vectors, so they are amenable to analysis by any statistical or mathematical method which
is meaningful on interval scales.

A measure of the inconsistency of the derived measurements or transformed values
g_l(C_l) is given by the decomposition of the modified energy functional E_p with respect to the
partition class C_l. This is just the sum of squared residuals between transformed values and
their associated configuration distances. To ensure comparability, the decomposition can be
divided by the number of elements in the class C_l. Scatter diagrams for each partition class
C_l of pseudo-distances and their associated distances against the initial partition data provide
a graphical representation of the consistency of the derived measurement or scale conversion
model. (These scatter diagrams are called Shepard diagrams in traditional IDMDS; here,
however, we have extended the usefulness of these plots beyond the analysis of proximities.)



The tool 100 scale conversion and merging procedure disclosed above can be adapted
to allow meaningful calculation of priorities for multiple criteria decision making (MCDM).
The following discussion employs the terminology of the Analytic Hierarchy Process (AHP).
(See Saaty, T. L., The Analytic Hierarchy Process: Planning, Priority Setting and
Resource Allocation, RWS Publications, Pittsburgh, 1990.) However, embodiments of the
present invention are applicable to MCDM independent of AHP or any other MCDM
methodology.

Let C = {C_k} be sets of pairwise comparisons of preferences between n alternatives
with respect to m criteria. And let D = {d_1, ..., d_m} denote a set of m weights or priorities, one
for each set of pairwise comparisons C_k. We define step 110 lower triangular matrices T_k =
[t_ij]_k with t_ijk = c_ijk ∈ C_k, where c_ijk indicates the degree of preference for object i over
object j with respect to criterion k. Often the c_ijk are assumed to be ratios of weights,
c_ijk = w_ik/w_jk, so that c_iik = 1. If this is the case, then an additional step is indicated
whereby the diagonal elements t_iik are set equal to zero. We also define constant weight
matrices W_k = [w_ij]_k where w_ijk = d_k for all 1 ≤ j ≤ i ≤ n. C is also partitioned into
classes C_l with scale groups G_l. The
matrices T_k, W_k, and scale groups G_l are submitted to the 2p-IDMDS algorithm in step 120 of
tool 100 for admissible geometrization. After appropriate step 130 commensuration and
merging, that is, in accord with the characteristics of the partition classes C_l, the merged
transformed values g_l(C_l) form a nonnegative interval or stronger scale matrix (by
substitution back into the original pairwise comparison matrices) from which priorities for
the alternatives can be derived by computing the principal eigenvector of this matrix. See
AHP reference above for this and other techniques for computing priorities. The point here
is that embodiments of the invention can compute priorities on tool 100 converted (interval
or better) scales.
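
The final eigenvector step can be sketched with power iteration. This is a generic
illustration, not a method prescribed by the disclosure; it assumes a square matrix with
strictly positive entries, and the function name and its parameters are hypothetical.

```python
def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the principal eigenvector of a positive square matrix.

    The returned vector is normalized to sum to one, so its entries can be read
    as priorities for the alternatives.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]  # renormalize so the entries sum to one
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v
```

For a perfectly consistent comparison matrix with entries w_i / w_j, the iteration
recovers the underlying weights after normalization.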

If the data sets C_k are composed of scores or ratings for each alternative, rather than
pairwise preferences, then the C_k may be written to ideal node matrices M_k with missing
value augmentation. Weight matrices W_k are now constructed with first column entries
below the diagonal equal to d_k and remaining entries set equal to one. An appropriate
(1)-partition of C is determined with classes C_l and scale groups G_l. M_k, W_k, and G_l are
submitted to the 2p-IDMDS algorithm for admissible geometrization. The resulting
transformed values g_l(C_l) are, in this case, the decision priorities; no additional matrix
manipulations are indicated. In this second, score based approach to MCDM, we could also
have used direct substitution matrices in step 110 with appropriate modifications to the
weight matrices W_k and partition (1). To provide approximate invariance over substitution
order, tool 100 replication over a sample of geometry altering permutations of the raw scores
or ratings would be performed in accordance with FIG. 2 and our earlier discussions of
replication.

Yet another approach to prioritizing (hierarchically arranged or clustered) paired
comparison data using tool 100 is to define a longitudinal partition over the matrices of
paired comparison preference data. More explicitly, the partition classes would consist of
same index entries from the lower triangular portions of same level or same cluster criterion
matrices. Priorities can then be found using tool 100 by (1) writing partition classes to ideal
node or direct substitution matrices (step 110), (2) applying step 120 to find diagonal
matrices diag(A_l), and (3) computing norms, ||diag(A_l)||, on the set of diagonal vectors,
diag(A_l), to define priorities. (If an identification condition is not specified, then, as
described earlier, the determinant or some other meaningful aggregation function can be
applied instead to meaningfully compute priorities from the complexes diag(A_l). Note, here
we are using the subscript l for both data object and (l)-partition; this should not cause any
confusion.) In step 110, we can explicitly include criteria priorities in the form of weight
matrices (as disclosed above) or criteria priorities can be applied post-hoc to the tool 100
priorities.
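
As a minimal sketch of item (3) above, the following computes Euclidean norms of a set of
diagonal vectors. The numerical values are invented for illustration, and rescaling the
norms to sum to one is an assumption of this sketch rather than a requirement of the
method.

```python
import numpy as np

def priorities_from_dilations(diag_vectors):
    """Euclidean norm of each vector diag(A_l), rescaled to sum to 1 (assumed convention)."""
    d = np.asarray(diag_vectors, dtype=float)
    norms = np.linalg.norm(d, axis=1)    # one norm per diagonal vector
    return norms / norms.sum()

# Hypothetical step 120 output: three diagonal vectors in two dimensions.
example = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]])
p = priorities_from_dilations(example)
```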

An advantage of tool 100 for MCDM is that heterogeneous, mixed measurement level
data may be prioritized directly. This is not the case for other MCDM tools such as the
Analytical Hierarchy Process, which assumes homogeneous data and that pairwise
comparisons generate ratio scales.

Tool 100 is adaptive or contextual. Changes in a single data element may result in
global changes in output. Tool 100 can be made progressively less contextual by fixing one
or more coordinates of the reference configuration Z. This is easily done in the PROXSCAL
based 2p-IDMDS algorithm. A natural choice in merging applications is to completely fix the
Z coordinates as the vertices of a centered and normalized (N-1)-simplex in N dimensions.
Fixing Z coordinates leaves only the deformation matrices A_k and admissible transformations
g_l to be determined in step 120. A second method for decontextualizing tool 100 output is to
insert fixed reference data objects or landmarks into each data set of interest. After
processing, these landmarks may be used to standardize results across data sets. A third and
straightforward option is to simply combine different data sets into a single analysis. This
latter method can also be used for batch mode replication: instead of processing samples
separately, they are combined into a single super data set. This super data set is preprocessed
and input to step 120. Step 120 output can then be analyzed by using average or centroid
configurations with respect to the replicated data sets.
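
One possible construction of such a fixed reference configuration is sketched below. It
assumes that "centered" means the vertex centroid is at the origin and that "normalized"
means each vertex has unit length; both readings, and the function name, are assumptions of
this sketch.

```python
import numpy as np

def simplex_reference(n_vertices):
    """Vertices of a regular (n-1)-simplex in n dimensions, centered and unit length."""
    Z = np.eye(n_vertices)                          # standard basis vectors as vertices
    Z -= Z.mean(axis=0)                             # move the centroid to the origin
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)   # scale every vertex to unit length
    return Z

Z = simplex_reference(4)
```

All vertices of the resulting configuration are mutually equidistant, so no object is
favored by the fixed geometry.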

The processes described above for tool 100 can each be expanded and generalized in
a number of ways. For example, with the exception of the application of tool 100 to MCDM,
we have implicitly assumed that the weights w_ijk in the modified energy functional E_p are
identically one. In one alternative embodiment, weights may be applied differentially to raw
and transformed data values. Weights can be assigned a priori or derived from the input data
itself. For example, if we suppose the data C is arranged in tabular or matrix form, then
applying tool 100 to C^T, the transpose of C, associates a weight to each of the original rows
C_k. Specifically, the scale conversion and merging process described above produces a
scalar, merged value for each row of C which is then used as the nonnegative weight for row
C_k. A scalar value can also be achieved by simply setting the embedding dimension N = 1.

For each of the tool 100 based merging processes described above, weights can be
integrated directly into the merging process through the use of nontrivial proximity weights
w_ijk in equation (2) of step 120. Weights can also be applied in postprocessing step 130
through weighted statistical merging functions on transformed step 120 output. Which
weighting method is selected depends on the data in question and the purpose of the analysis.

In another alternative embodiment, in the preprocessing step 110, data C_k (matrices
M_k) can be augmented with artificial values. For example, C_k (M_k) may be augmented with
missing values, repeated constants, or random values. The C_k (M_k) may also be augmented
through concatenation of copies of the data values themselves. Augmentation of the C_k
allows processing of data sets of differing cardinality and missing values. In conjunction
with (l)-partitioning, augmentation greatly increases the kinds of data that may be processed
by tool 100.
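
The following sketch illustrates missing value augmentation for data sets of differing
cardinality. Marking missing entries with NaN and giving them zero proximity weight is an
assumption borrowed from common multidimensional scaling practice; the function name and
the example data are hypothetical.

```python
import numpy as np

def augment_with_missing(data_sets):
    """Pad data sets of differing cardinality to a common length.

    Returns (values, weights): padded entries are NaN in values and 0 in weights.
    """
    width = max(len(c) for c in data_sets)
    values = np.full((len(data_sets), width), np.nan)
    weights = np.zeros((len(data_sets), width))
    for k, c in enumerate(data_sets):
        values[k, :len(c)] = c        # observed values keep their positions
        weights[k, :len(c)] = 1.0     # unit weight marks an observed value
    return values, weights

values, weights = augment_with_missing([[1, 2, 3], [4], [5, 6]])
```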

Minimization of the modified energy function E_p is a constrained least squares
approach to admissible geometrization. While the idea of energy minimization seems
natural, admissible geometrization does not require a least squares objective function.
Alternative embodiments have been identified including geometrization based on
(constrained) least absolute differences, non-dimensional ordinal scaling (see Cunningham, J.
P. and Shepard, R. N., "Monotone mapping of similarities into a general metric space," J.
Math. Psychol., 11, 1974, 335-363), and nonlinear principal components analysis (or
Princals, see Gifi, A., "Algorithm descriptions for Anacor, Homals, Princals, and Overals,"
Tech. Report No. RR-89-01, Department of Data Theory, University of Leiden, 1989).
However, embodiments of the present invention are more flexible and therefore have greater
applicability than either non-dimensional scaling or Princals. L_1 or least absolute differences
minimization is generally more difficult to implement than least squares minimization so an
alternative embodiment of admissible geometrization through constrained L_1 optimization
overcomes certain technical programming problems.

To further specify the method and apparatus in accordance with embodiments of the
present invention, descriptive examples of the application of the embodiments of the
present invention follow. These examples are illustrative only and shall in no way limit
the scope of the method or apparatus.

Example A: Data mining

Suppose company XYZ has an m client database which contains the following fields:
(1) client age, (2) income, (3) region of domicile, (4)-(6) Likert scale responses to survey
questions concerning company service plan A, and (7) an indicator field showing which XYZ
service plan, B or C, the client is using. Company XYZ has acquired new clients for whom
they have information on fields (1) through (6) and they would like to predict which service
plan, B or C, a new client will select. We apply the three step process of an embodiment of
the present invention, tool 100.

Let C_k = {c_k1, ..., c_k7} be the record for client k. Then we define m + 1, 4 x 4 direct
substitution matrices T_k as follows

          ( 0     *     *     * )
    T_k = ( c_k1  0     *     * )
          ( c_k2  c_k3  0     * )
          ( c_k4  c_k5  c_k6  0 )

where the 7th field has been omitted and * denotes empty entries (recall, T_k is hollow, lower
triangular). The first m of these matrices correspond to previous XYZ clients whose 7th field
values are known. The (m + 1)-th matrix represents a new client whose field 7 value is to be
predicted. We next define a (l)-partition by fields, that is, partition class C_l corresponds to
field l, for l = 1,...,6. Scale groups or scale types are assigned as follows: G_1 and G_2 are
similarity groups defining ratio scale types; G_3 is Σ_m, the permutation group on m letters,
defining nominal scale type; and G_4 through G_6 are isotonic groups defining ordinal scale
types. (Note, had we assumed that the Likert scales in fields 4-6 were comparable, then we
could combine partition classes C_4 through C_6 into a single ordinal scale class.) In this
hypothetical application of embodiments of the invention, unit proximity weights can be
assumed. However, if it turned out, for some reason, that age was a universally important
variable in determining plan selection, one could assign a high value to the proximity weight
on the age entry for each client record k.

Since direct substitution encoding is not invariant under substitution reorderings, we
create 6! = 720 replications or rearrangements of the above matrices and partitions which will
be processed in step 120 and averaged over in step 130. (Note, we do not really need to
create 6! replications since 4! of these will not alter the admissible geometry in step 120.) If
weight matrices are involved, these can be permuted accordingly.
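
The direct substitution encoding and its replication over rearrangements can be sketched as
follows. The row by row filling order is inferred from the matrix layout shown above, NaN
is used here to mark the empty (*) entries, and the example record (including the integer
coding of the nominal region field) is invented for illustration.

```python
from itertools import permutations
import numpy as np

def direct_substitution(values, order=4):
    """Write values row by row into the strictly lower triangle of a hollow matrix."""
    rows, cols = np.tril_indices(order, k=-1)   # (1,0), (2,0), (2,1), (3,0), ...
    if len(values) != len(rows):
        raise ValueError("number of values must fill the lower triangle exactly")
    T = np.full((order, order), np.nan)         # NaN marks the empty (*) entries
    np.fill_diagonal(T, 0.0)                    # hollow: zero diagonal
    T[rows, cols] = values
    return T

# Hypothetical client record, fields (1) through (6): age, income, region code, 3 Likert items.
record = [34, 52000, 2, 4, 5, 3]
T = direct_substitution(record)

# One replication per rearrangement of the substitution order (6! = 720 in total).
replications = [direct_substitution([record[i] for i in p])
                for p in permutations(range(6))]
```

When a partition by fields is in use, the same rearrangement would also have to be applied
to the partition (and to any weight matrices), which this sketch does not track.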

In step 120, the m + 1 matrices T_k and admissible transformation information G_l,
l = 1,...,6, are submitted to the 2p-IDMDS algorithm with the following specifications: (i)
embedding dimension N = 3, and (ii) the deformation matrices A_k in constraint equations (3)
are diagonal (the INDSCAL model of traditional IDMDS) with identification condition (4)
enforced. We also designate that transformed data values or pseudo-distances are to be
standardized within, rather than across, partition classes. A number of other technical
2p-IDMDS parameters also can be set; for example, one can select to treat the ordinal data
from fields 4-6 as either continuous or discrete (as mentioned above, this corresponds to
so-called primary and secondary approaches to ties in IDMDS, though in 2p-IDMDS we can
specify different approaches to ties for each ordinal class C_l). We can also decide on the
convergence criteria, minimum energy E_p, and the maximum number of iterations to allow.

Step 120 is repeated on each of the 720 replications constructed in step 110. The
output for each of these replications is a set of dilation vectors diag(A) = {diag(A_k)} which,
because of identification condition (4), defines a set of 3-vectors or points in the positive
orthant of R^3. These 720 sets of dilation vectors are then averaged by calculating the
geometric mean over dimensions. We abuse notation and write this averaged set of vectors
as diag(A), as well.
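
A minimal sketch of the dimension-wise geometric mean over replications is given below. The
array layout (replication, client, dimension) and the toy values are assumptions of the
sketch; strictly positive entries are required, as is the case in the positive orthant.

```python
import numpy as np

def geometric_mean_dilations(replicated):
    """Geometric mean over replications of positive dilation vectors.

    replicated: array of shape (R, m + 1, N); the mean is taken over the R axis,
    separately for each client and each dimension.
    """
    r = np.asarray(replicated, dtype=float)
    return np.exp(np.log(r).mean(axis=0))

# Two hypothetical replications of a single client's dilation 2-vector.
averaged = geometric_mean_dilations([[[1.0, 4.0]], [[4.0, 1.0]]])
```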

Step 130 postprocessing is based on statistical clustering analysis of diag(A), the
merged classification space of dilation 3-vectors. This is one of a number of ways to analyze
this and real databases, but it is a very natural approach, as we will discuss shortly. The first
m of the vectors in diag(A) are divided into two disjoint groups according to their known
field 7 values. The goal is to predict the unknown field 7 value for the (m + 1)-th client
vector using the spatial organization of the set diag(A) in R^3 and the field 7 differential
marking of the initial m vectors. While there are a number of ways in which this clustering
analysis can be carried out, a natural choice is multiresponse permutation procedures or
MRPP (see Mielke, P. W. and Berry, K. J., Permutation Methods: A Distance Function
Approach, Springer, New York, 2001). MRPP allows classification of an additional object,
in this case, the (m + 1)-th client, into one of the two disjoint groups of field 7 distinguished
vectors or clients. We will not describe the MRPP methodology here except to point out that
MRPP, as its name suggests, determines the probability that an additional object belongs to a
particular group by computing P-values using permutation procedures. In addition, MRPP
allows for classification with an excess group. The excess group can be used to identify
anomalous objects or outliers in the tool 100 classification space, diag(A).

The use of MRPP in the postprocessing step 130 of embodiments of the present
invention is natural in the sense that MRPP is a model free, (Euclidean) distance function
approach to statistical analysis and embodiments of the present invention are, among other
things, a model free technique for transforming data, in particular, messy, intermixed scale
type data into geometric (Euclidean, in the presently preferred embodiment) configurations
of points.
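
For orientation only, the basic MRPP test of group separation can be sketched as below. This
is a simplified Monte Carlo version: the statistic is the group-size weighted average of
within-group mean Euclidean distances, and the P-value is the proportion of label
permutations giving a statistic at least as small as the observed one. It does not
implement classification of an additional object or the excess group discussed above, and
the simulated 3-vectors stand in for tool 100 output.

```python
import numpy as np

def mrpp_delta(points, labels):
    """Weighted average of within-group mean pairwise Euclidean distances.

    Each group needs at least two members; group weights are n_i / N.
    """
    n = len(points)
    delta = 0.0
    for g in np.unique(labels):
        grp = points[labels == g]
        diffs = grp[:, None, :] - grp[None, :, :]
        dist = np.sqrt((diffs ** 2).sum(axis=-1))
        upper = np.triu_indices(len(grp), k=1)      # each pair counted once
        delta += (len(grp) / n) * dist[upper].mean()
    return delta

def mrpp_p_value(points, labels, n_perm=2000, seed=0):
    """Monte Carlo P-value: share of permuted labelings with delta <= observed."""
    rng = np.random.default_rng(seed)
    observed = mrpp_delta(points, labels)
    count = 0
    for _ in range(n_perm):
        if mrpp_delta(points, rng.permutation(labels)) <= observed + 1e-12:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Simulated dilation 3-vectors for clients on plan B and plan C (illustrative only).
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(1.0, 0.1, size=(10, 3)),
                    rng.normal(3.0, 0.1, size=(10, 3))])
labels = np.array(["B"] * 10 + ["C"] * 10)
delta, p_value = mrpp_p_value(points, labels)
```

A small P-value indicates that the marked groups are more tightly concentrated in the
classification space than would be expected under random labeling.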

The optimal choice for 2p-IDMDS embedding dimension can be found using a
training set of clients with known field 7 values. The most perspicacious dimension may be
found by back testing the training set holdouts over a range of dimensions. The optimal
training set dimensions are then used for predicting field 7 values for new clients.

Example B: Response modeling 

While example A refers to a classification problem, MRPP P-values can be used to
order any number of objects with respect to many kinds of criteria. It is a simple matter to
recast example A as a response modeling problem: Let field 7 indicate response or no
response to a direct marketing campaign. Then the MRPP determined P-values for "new
clients" on the marked classification space, diag(A), indicate the probability that a person
(new client) will respond to a solicitation. It is then straightforward to construct a lift table
from the list of "new clients" sorted by MRPP determined response probability or P-value.
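
A lift table of the kind just mentioned might be assembled as in the following sketch. The
scores play the role of the MRPP determined response probabilities, the known responses
would come from a holdout campaign, and the number of bins is arbitrary; all values and
names are invented for illustration.

```python
import numpy as np

def lift_table(scores, responded, n_bins=5):
    """Sort by descending score, split into bins, and report lift per bin.

    Returns rows of (bin number, bin size, response rate, lift over overall rate).
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(responded, dtype=float)[order]
    overall = y.mean()                                  # baseline response rate
    rows = []
    for b, chunk in enumerate(np.array_split(y, n_bins), start=1):
        rate = chunk.mean()
        rows.append((b, len(chunk), rate, rate / overall))
    return rows

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
table = lift_table(scores, responded)
```

In this toy data the top bin has a response rate of 1.0 against an overall rate of 0.4,
that is, a lift of 2.5.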

Example C: Anomaly detection 

Example A can also be reinterpreted in terms of time series, signals, or sequential
data. The data objects C_k are now data sequences, for example, process traces from a
computer server. The sequences C_k can be processed by tool 100 in precisely the same
manner as disclosed in example A, only now field 7 represents some characteristic or labeling
of the sequence. In the case of process traces this label might indicate whether the given
process trace represents benign behavior or an intrusion or attack. The (m + 1)-th sequence
or "client" is a monitored process or signal. In this case, MRPP classification of this
monitored process into an excess group indicates the occurrence of some sort of anomalous
behavior. The relative size of the associated P-values for excess and non-excess groups
indicates the degree of certainty that anomalous behavior has occurred or is occurring.

From the foregoing, it can be seen that the illustrated embodiments of the present
invention provide a method and apparatus for classifying, converting, and merging possibly
intermixed measurement level input data. Input data are received and formed into one or
more matrices. Furthermore, intermixed measurement level input data is partitioned into
classes and scale groups. Matrices are processed by 2p-IDMDS to produce decomposition of
modified energy, deformation matrices, and transformed data values. A back end or
postprocessing step organizes, decodes, interprets, and aggregates process step output. The
technique in accordance with embodiments of the present invention avoids limitations
associated with earlier applications of energy minimization for classification, conversion, and
aggregation of data, extending these earlier processes to intermixed measurement level data
and further applications.



Additional illustrative embodiments of the present invention can apply to voter
preference and grading or scoring of assessment instruments.



Let C = {C_1, ..., C_m} denote a group of m voters and let C_k = {c_k1, ..., c_kn} be the
preferences of voter k for each of n candidates or choices (large values of c_ki correspond to ...).
The three-step process of the present embodiment of tool 100 may be used in a number of
ways to determine a group ordering or preference of the n candidates or choices. In one
approach, the ordinal preferences of each voter C_k are written to direct substitution matrices
M_k with trivial partition C. This preprocessing step 110 may be replicated one or more times
over rearrangements of the substitution order where the number of replications is determined
by the requirements of the data set C and appropriate statistical reliability analyses. Each
replication is then submitted to the processing step 120 of the presently preferred
embodiment of tool 100. In step 120, admissibly transformed values or pseudo-distances ĉ_ki
are produced for each voter preference c_ki. In one embodiment of the invention, admissibly
transformed values ĉ_ki are found using monotone regression in the 2-phase transformation
portion of 2p-IDMDS. In step 130 of tool 100, the replicated transformed values ĉ_ki are
collected, made commensurate (if indicated by the analysis or data), and merged. The
merged replicated transformed values are then aggregated by candidate, defining a group
preference on the set of candidates or choices.
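
For readers unfamiliar with monotone regression, the following sketch implements the
standard pool-adjacent-violators procedure for a nondecreasing least squares fit. It is a
generic illustration and not the 2p-IDMDS transformation phase itself; the assumption is
that the input values are listed in order of increasing raw preference, so that the fitted
values respect that order.

```python
def monotone_regression(y, w=None):
    """Pool-adjacent-violators: weighted least squares fit that is nondecreasing.

    y: values listed in the order imposed by the ordinal data (assumed convention).
    w: optional positive weights, defaulting to one.
    """
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    blocks = []                                   # each block: [mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)                # expand blocks back to full length
    return fit

fitted = monotone_regression([1.0, 3.0, 2.0, 4.0])
```

Here the out of order pair (3.0, 2.0) is pooled to its mean, giving the fitted sequence
1.0, 2.5, 2.5, 4.0.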

In an alternative approach, the voter group C is thought of as defining a rectangular m
by n matrix. The rows of the transpose of this matrix are then submitted to the three step
process described in the preceding paragraph where now the direct substitution matrices are n
in number, one for each candidate or choice. As in the previous paragraph, the trivial
partition is selected with possible replication and step 120 processing applied in a manner
analogous to that described above. In the postprocessing step 130, there are at least two
methods for determining group preferences. The first is similar to the previous description:
admissibly transformed data are made commensurate (if indicated) and merged across
replications, then the merged replicated transformed values are merged by candidate where
now candidate admissibly transformed values are grouped together. In a second approach,
deformation matrices {A_k} are collected from step 120 and are meaningfully averaged or
merged over replications. The merged replication deformation matrices are then measured
for size, where the matrix function determining the size of the deformation matrices depends
on the form of the constraint equation X_k = ZA_k. For example, if the A_k are diagonal and
satisfy an identification condition, then the size of the A_k can be defined as ||diag(A_k)||,
the norm of the vector formed by the diagonal entries of A_k. The size of the matrix A_k is
interpreted to be the group preference for candidate or choice k.

Embodiments of the present invention can also be applied to grade or score subject
performance on various assessment instruments including standardized tests, aptitude and
achievement exams, the SAT, graduate record exams, intelligence tests, personality,
placement, and career inventories, and other instruments.

In one illustration, let the data set C = {C_1, ..., C_m} denote a group of m subjects and
the sets C_k = {c_k1, ..., c_kn} consist of zero/one values with zero (one) indicating an
incorrect (correct) response by subject k on each of n items or questions in a test or
assessment instrument. In addition, let W_k = {w_k1, ..., w_kn} be proximity weights
representing the difficulty levels of the n items or questions. (Other information or testing
data may be encoded in the sets C_k including, for instance, human or automatic grader scores
on n questions for individual k. The present embodiment of the invention may be easily adapted
to these and other data sets by one skilled in the art.)

The three-step process of the presently preferred embodiment of tool 100 may be
used to determine a test score or grade for each of the above m subjects C_k in a number of
ways. In one approach, in step 110 of tool 100, the nominal responses of each subject C_k are
written to direct substitution matrices M_k with trivial partition C. (Binary values may also be
treated as ratio scale type data.) Preprocessing step 110 is replicated over rearrangements of
substitution order of the elements of the subjects C_k with the number of replications
determined by the data set C and the results of statistical analyses. Each replication is then
submitted to step 120 of the presently preferred embodiment of tool 100. In step 120,
weighted admissibly transformed values or pseudo-distances ĉ_ki are found for each subject
response c_ki. In the presently preferred embodiment of the invention, the process step 120
consists of 2p-IDMDS with 2-phase nominal transformations and possibly nontrivial
(non-unit) proximity weights. In step 130 of tool 100, the replicated transformed values are
collected, made commensurate (if indicated by the analysis or data), and merged. The
merged replicated transformed values are then aggregated by subject defining an overall
subject grade or test score. In a second approach, deformation matrices {A_k} produced in
step 120 of tool 100 are meaningfully averaged or merged over replications (for example,
using the dimension-wise geometric mean). The merged replication deformation matrices
are then measured for size, where the matrix function determining the size of the deformation
matrices depends on the form of the constraint equation X_k = ZA_k. For example, if the A_k
are diagonal and satisfy an identification condition, then the size of the A_k can be defined as
||diag(A_k)||, the norm of the vector formed by the diagonal entries of A_k. The size of the
matrix A_k is interpreted as the grade or test score for subject k.

Scoring or grading assessment instruments according to the above description of the
presently preferred embodiment of the invention is contextual or relative. A pool of subjects
and subject test scores can be maintained against which new subjects may be scored or
graded. More specifically, if B is a set of baseline test subjects, then an individual C_k (or
group C) may be scored against this baseline group by applying the above tool 100 three-step
procedure to the union C_k ∪ B (or C ∪ B).

The application of the present embodiment of the invention may be modified to
include proximity weight matrices W_k in tool 100 determination of group voter preference or
choice. In addition, the above voter and assessment analyses can be performed in a
symmetric, or rearrangement invariant manner, by using ideal node transformation in
preprocessing step 110.

In general, the admissibly transformed values produced by step 120 of tool 100 may
be meaningfully processed by a univariate or multivariate statistical technique that is
meaningful on interval or weaker scale types. In this way, the group preferences or subject
test scores produced by tool 100, as described above, can be treated as univariate or
multivariate interval or stronger scale complexes (or vectors if appropriate identification
conditions have been imposed on the constraint equations (4)).

While one or more particular embodiments of the present invention have been shown
and described, modifications may be made. As described above, geometrization algorithms
based on other objective functions may replace 2p-IDMDS. It is therefore intended in the
appended claims to cover all such changes and modifications that fall within the true spirit
and scope of the invention.